 loss component


Scalable Decision-Focused Learning through Cost-Sensitive Regression

arXiv.org Machine Learning

Many real-world combinatorial problems involve uncertain parameters, which can be predicted given contextual features and historical data. These 'predict-then-optimize' or 'contextual optimization' problems have gained significant attention: end-to-end training methods can now minimize the downstream task cost rather than the predictive error. However, despite their effectiveness, these decision-focused learning (DFL) approaches often rely on repeated solving of the underlying combinatorial optimization problem during training, making them computationally expensive and difficult to scale. We reframe the learning problem as a cost-sensitive multi-output regression problem: multi-output due to the combinatorial problem having multiple uncertain parameters, and cost-sensitive due to the downstream task cost being the real target. Our technical contribution is the formalization of multiple loss function components that follow from this reframing: cost-insensitive normalization, decision-aware asymmetric penalization of over- and underpredictions, and instance-based costs that mimic the true downstream task-based loss locally. These components require zero or one solve per training data instance, while requiring no further solves during training. Experiments show that the combination of loss components achieves comparable downstream task quality to the state of the art, while being significantly more efficient, enabling scaling to problem sizes that have not been tackled before with DFL.
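As a rough illustration of what asymmetric penalization of over- and underpredictions can look like, the sketch below weights squared errors differently depending on the sign of the error. It is not the loss defined in the paper: the function name `asymmetric_squared_loss` and the parameters `over_weight` and `under_weight` are hypothetical, and the squared-error form is an assumption chosen for simplicity.

```python
def asymmetric_squared_loss(y_true, y_pred, over_weight, under_weight):
    """Mean squared error with separate weights for over- and underprediction.

    Hypothetical sketch: over_weight applies when the prediction exceeds the
    target, under_weight otherwise. Not the paper's formulation.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for target, pred in zip(y_true, y_pred):
        error = pred - target
        # Positive error means the parameter was overpredicted.
        weight = over_weight if error > 0 else under_weight
        total += weight * error ** 2
    return total / len(y_true)


if __name__ == "__main__":
    # One overprediction (weight 3) and one underprediction (weight 1).
    print(asymmetric_squared_loss([1.0, 2.0], [2.0, 1.0], 3.0, 1.0))
```

In a decision-aware setting, the two weights would presumably be chosen per parameter according to how costly each direction of error is for the downstream decision. Here they are plain constants.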





Uncertainty-Resilient Multimodal Learning via Consistency-Guided Cross-Modal Transfer

arXiv.org Artificial Intelligence

Multimodal learning systems often face substantial uncertainty due to noisy data, low-quality labels, and heterogeneous modality characteristics. These issues become especially critical in human-computer interaction settings, where data quality, semantic reliability, and annotation consistency vary across users and recording conditions. This thesis tackles these challenges by exploring uncertainty-resilient multimodal learning through consistency-guided cross-modal transfer. The central idea is to use cross-modal semantic consistency as a basis for robust representation learning. By projecting heterogeneous modalities into a shared latent space, the proposed framework mitigates modality gaps and uncovers structural relations that support uncertainty estimation and stable feature learning. Building on this foundation, the thesis investigates strategies to enhance semantic robustness, improve data efficiency, and reduce the impact of noise and imperfect supervision without relying on large, high-quality annotations. Experiments on multimodal affect-recognition benchmarks demonstrate that consistency-guided cross-modal transfer significantly improves model stability, discriminative ability, and robustness to noisy or incomplete supervision. Latent space analyses further show that the framework captures reliable cross-modal structure even under challenging conditions. Overall, this thesis offers a unified perspective on resilient multimodal learning by integrating uncertainty modeling, semantic alignment, and data-efficient supervision, providing practical insights for developing reliable and adaptive brain-computer interface systems.


LLavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation

arXiv.org Artificial Intelligence

Retrieval-augmented generation has emerged as one of the most effective approaches for code completion, particularly when context from a surrounding repository is essential. However, incorporating context significantly extends sequence length, leading to slower inference, a critical limitation for interactive settings such as IDEs. In this work, we introduce LlavaCode, a framework that compresses code into compact, semantically rich representations interpretable by a code LLM, enhancing generation quality while reducing the retrieved context to only a few compressed single-token vectors. Using a small projector module, we can significantly increase the EM and ES metrics of a coding model with a negligible latency increase. Our experiments demonstrate that compressed context enables a 20-38% reduction in Time-to-First-Token (TTFT) on line completion tasks compared to full-RAG pipelines.



Impact of Loss Weight and Model Complexity on Physics-Informed Neural Networks for Computational Fluid Dynamics

arXiv.org Artificial Intelligence

Physics-informed neural networks (PINNs) offer a mesh-free framework for solving PDEs but are highly sensitive to loss weight selection. We propose two weighting schemes based on dimensional analysis: one built on quantifiable terms, and another that also incorporates unquantifiable terms for more balanced training. Benchmarks on heat conduction, convection-diffusion, and lid-driven cavity flows show that the second scheme consistently improves stability and accuracy over equal weighting. Notably, in high-Peclet-number convection-diffusion, where traditional solvers fail, PINNs with our scheme achieve stable, accurate predictions, highlighting their robustness and generalizability in CFD problems.
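To make the role of loss weights concrete, here is a minimal sketch of a weighted loss with a PDE-residual term and a boundary term. The problem and method are assumptions for illustration: a toy 1D problem u''(x) = 0 with u(0) = 0 and u(1) = 1, and central finite differences in place of the automatic differentiation a real PINN would use. The name `weighted_pinn_loss` and the weights `w_pde` and `w_bc` are hypothetical, and the weights are plain constants, not the paper's dimensional-analysis schemes.

```python
def weighted_pinn_loss(u, xs, w_pde, w_bc, h=1e-3):
    """Weighted sum of PDE-residual and boundary losses for u''(x) = 0.

    u   : candidate solution, a callable float -> float
    xs  : interior collocation points
    h   : finite-difference step (assumption: stands in for autodiff)
    Returns (total, pde_loss, bc_loss).
    """
    if not xs:
        raise ValueError("xs must contain at least one collocation point")

    # Mean squared PDE residual, with u'' approximated by central differences.
    pde_loss = 0.0
    for x in xs:
        second_derivative = (u(x + h) - 2.0 * u(x) + u(x - h)) / h ** 2
        pde_loss += second_derivative ** 2
    pde_loss /= len(xs)

    # Squared boundary mismatch for u(0) = 0 and u(1) = 1.
    bc_loss = (u(0.0) - 0.0) ** 2 + (u(1.0) - 1.0) ** 2

    total = w_pde * pde_loss + w_bc * bc_loss
    return total, pde_loss, bc_loss


if __name__ == "__main__":
    points = [0.25, 0.5, 0.75]
    # u(x) = x**2 meets both boundary conditions but violates the PDE.
    print(weighted_pinn_loss(lambda x: x ** 2, points, w_pde=0.5, w_bc=1.0))
```

Because the two terms can differ by orders of magnitude, the choice of `w_pde` and `w_bc` decides which term dominates the gradient. That imbalance is the sensitivity the abstract refers to.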


Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering

arXiv.org Artificial Intelligence

Recent trends in humanoid robot control have successfully employed imitation learning to enable the learned generation of smooth, human-like trajectories from human data. While these approaches make more realistic motions possible, they are limited by the amount of available motion data, and do not incorporate prior knowledge about the physical laws governing the system and its interactions with the environment. Thus they may violate such laws, leading to divergent trajectories and sliding contacts which limit real-world stability. We address such limitations via a two-pronged learning strategy which leverages the known physics of the system and fundamental control principles. First, we encode physics priors during supervised imitation learning to promote trajectory feasibility. Second, we minimize drift at inference time by applying a proportional-integral controller directly to the generated output state. We validate our method on various locomotion behaviors for the ergoCub humanoid robot, where a physics-informed loss encourages zero contact foot velocity. Our experiments demonstrate that the proposed approach is compatible with multiple controllers on a real robot and significantly improves the accuracy and physical constraint conformity of generated trajectories.
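The second prong, applying a proportional-integral controller to the generated output state, can be sketched for a single scalar state as follows. This is a sketch under stated assumptions and not the authors' controller: the class name `PIController`, the gains `kp` and `ki`, the time step `dt`, and the existence of a reference value to steer toward are all hypothetical.

```python
class PIController:
    """Discrete proportional-integral correction of a generated scalar state.

    Hypothetical sketch: the generated value is nudged toward a reference,
    with the integral term accumulating persistent error (drift).
    """

    def __init__(self, kp, ki, dt):
        self.kp = kp          # proportional gain
        self.ki = ki          # integral gain
        self.dt = dt          # time step between generated states
        self.integral = 0.0   # accumulated error over time

    def correct(self, generated, reference):
        """Return the generated state after PI correction."""
        error = reference - generated
        self.integral += error * self.dt
        return generated + self.kp * error + self.ki * self.integral


if __name__ == "__main__":
    controller = PIController(kp=0.5, ki=0.1, dt=1.0)
    # A generator that keeps emitting 1.0 while the reference is 2.0:
    # the correction grows each step as the integral term accumulates.
    for _ in range(3):
        print(controller.correct(generated=1.0, reference=2.0))
```

The proportional term reacts to the current error only, while the integral term keeps growing for as long as the error persists, which is why a PI structure is a natural fit for slowly accumulating drift.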